Selective web information retrieval

Author

  • Vasileios Plachouras
Abstract

One of the main challenges in Web information retrieval is the number of different retrieval approaches that can be used for ranking Web documents. In addition to the textual content of Web documents, evidence from the structure of Web documents, or from the analysis of the hyperlink structure of the Web, can be used to enhance retrieval effectiveness. However, not all queries benefit equally from applying the same retrieval approach. An additional challenge is posed by the fact that the Web enables users to seek information by both searching and browsing. Users therefore perform not only typical informational search tasks, but also navigational search tasks, where the aim is to locate a particular Web document that has been visited before or that is expected to exist. To address these challenges, this thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory, with the aim of applying an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that makes this selection. The selection of a particular retrieval approach is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiment. The first type counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second type considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites or directories within Web sites. The third type estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents.
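The per-query selection described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the approach names, the two-level experiment outcome, the 0.5 threshold, and all probabilities are hypothetical values chosen for illustration. The experiment loosely mirrors the first type (query-term occurrence in a sample of retrieved documents), and the decision rule assumes a 0/1 loss, under which minimizing expected loss reduces to choosing the approach with the highest posterior probability.

```python
from dataclasses import dataclass


@dataclass
class Approach:
    name: str          # hypothetical retrieval approach label
    prior: float       # assumed P(this approach is the most effective)
    likelihood: dict   # assumed P(outcome | this approach is the most effective)


def experiment(query_terms, sampled_docs, threshold=0.5):
    """Return 'high' or 'low' depending on the share of sampled documents
    that contain every query term (the threshold is an assumption)."""
    if not sampled_docs:
        return "low"
    hits = sum(
        all(term in doc.lower().split() for term in query_terms)
        for doc in sampled_docs
    )
    return "high" if hits / len(sampled_docs) >= threshold else "low"


def select_approach(outcome, approaches):
    """Pick the approach with the largest prior * likelihood, which is
    proportional to the posterior given the experiment outcome."""
    best = max(approaches, key=lambda a: a.prior * a.likelihood[outcome])
    return best.name


# Hypothetical probabilities; in practice they would be estimated
# from queries with relevance information.
approaches = [
    Approach("content_only", 0.6, {"high": 0.7, "low": 0.3}),
    Approach("content_plus_links", 0.4, {"high": 0.2, "low": 0.8}),
]

sample = [
    "glasgow university library",
    "university of glasgow",
    "library opening hours",
]
outcome = experiment(["glasgow", "university"], sample)
print(outcome, select_approach(outcome, approaches))
```

In this toy setting, a query whose terms are well covered in the sample is routed to the content-only approach, while a poorly covered query is routed to the approach that also uses hyperlink evidence. Which way the routing actually goes for real collections is an empirical question that the sketch does not answer.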
The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system's input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced, in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically


Similar resources

Assessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools

Introduction: This study evaluated the internal structure of the Ellis information-seeking model in the student community, with the aim of presenting the Persian norm. Methods: This is a descriptive-analytical study conducted using a cross-sectional survey method in the second semester of the academic year 1399-1400. The population comprised 280 graduate students at Ahvaz Jundishapur University of Medical Scien...

Behavioral Considerations in Developing Web Information Systems: User-centered Design Agenda

The current paper explores the design of a web information retrieval system with regard to users' search behavior in real, everyday life. Designing an information system that is closely linked to human behavior is equally important for providers and end users. From an information science point of view, four approaches to designing information retrieval systems were identified as system-...

Comparison of Information Retrieval Capabilities in Library Software of Payam, Voyager and Aleph

The purpose of this study was to compare the information retrieval capabilities of the Web-based library software Payam with those of Voyager and ALEPH. A checklist was designed that included six main traits for evaluating and comparing 73 scales. Data were collected through expert observation of the software's OPAC and analyzed using descriptive statistics. The findings show the preferences in search capabilities i...

Health-Related Image Information Retrieval on the Web from the Perspective of Medical Sciences Experts: A Qualitative Study

Introduction: The medical image, as a source of non-textual information, has an important role in the field of medicine. Since quality of life is directly related to health, employing this type of information is effective in improving the practice of health professionals. This study aimed to survey medical image retrieval on the Web from the perspective of experts in medical sciences. M...

Intellectual Structure of Knowledge in Information Behavior: A Co-Word Analysis

Background and Aim: The intellectual structure of knowledge and its research front can be identified by co-word analysis. This research attempts to reveal the intellectual structure of knowledge in information behavior inquiries, via co-word, network analysis, and science visualization tools. Methods: Bibliometric methodology and social network analysis are used. Population comprises 2146 recor...

Publication date: 2006